Overview

This project investigates Symbiodinium communities in scleractinian and milleporine corals sampled at various sites on the north and south shores of St. John, US Virgin Islands in August 2012. The ITS2 region of nrDNA isolated from coral samples was amplified and sequenced by paired-end 300 bp reads on an Illumina MiSeq platform. Sequence data is processed through a custom bioinformatic pipeline and resulting OTU tables are statistically analyzed to ask questions regarding diversity and community structure of Symbiodinium within and across coral species and between shores. The major aims of this work are to:

  1. characterize Symbiodinium communities in a range of coral species using high-throughput sequencing (many for the first time)
  2. forward novel bioinformatic (within-sample clustering) and statistical (beta diversity/distance to centroid, network analysis) approaches for the use of ITS2 data in Symbiodinium ecology.
Sampling locations in St. John

Sampling locations in St. John

Bioinformatics

A custom pipeline is used here to process ITS2 sequence data. Briefly, paired reads were merged using illumina-utils and filtered to keep only those with 3 mismatches or less. Chimeras were then removed by usearch, and singletons were removed. From here, sequence data was processed using each of three approaches:

  1. 100% OTU clustering. All ITS2 sequences from all samples were clustered at 100% identity using uclust and assigned taxonomy by BLAST to the custom reference database.
  2. 97% OTU clustering. All ITS2 sequences from all samples were clustered at 97% identity using uclust and the most abundant sequence within each OTU was selected as the representative sequence and assigned taxonomy by BLAST to the custom reference database.
  3. 97% OTU clustering within samples. Sequences from each individual sample were clustered separately at 97% similarity and the most abundant sequence from each cluster was selected as the representative sequence and assigned taxonomy by BLAST to the custom reference database. OTUs with identical representative sequences were then merged across samples to create a global OTU table. By clustering at 97% within samples, as opposed to across the entire dataset, OTUs from different samples that have different representative sequences (i.e., different most abundant sequence within the 97% cluster) will remain separate even if these sequences are >97% similar to each other. This approach is used because Symbiodinium ITS2 sequences are know to be multi-copy and intragenomically variable, and distinct Symbiodinium types may contain some of the same ITS2 sequence variants in different relative proportions. Therefore, selecting the most abundant sequence variant within a sample prevents a potentially distinct Symbiodinium type from being collapsed into an OTU with a different representative sequence that happens to be more abundant in other samples within the dataset. For more discussion on this approach, see this entry in my lab notebook.

OTU tables and taxonomic assignments from each of these bioinformatic approaches are used in comparative downstream analyses.

Data pre-processing

Import dataset

The phyloseq package is used here to combine the OTU table, taxonomic assignments, and sample metadata into a single R data object (class ‘phyloseq’) to facilitate downstream analyses.

# Import OTU tables
OTU97 <- otu_table(read.table("data/97_otus.tsv", header=T, check.names=F, row.names=1,
                              skip=1, sep="\t", comment.char=""), taxa_are_rows=T)
OTU97bs <- otu_table(read.table("data/97_otus_bysample.tsv", header=T, check.names=F, row.names=1,
                              skip=1, sep="\t", comment.char=""), taxa_are_rows=T)
OTU100 <- otu_table(read.table("data/100_otus.tsv", header=T, check.names=F, row.names=1,
                               skip=1, sep="\t", comment.char=""), taxa_are_rows=T)
# Import taxonomy
TAX97 <- tax_table(as.matrix(read.table("data/97_otus_tax_assignments.txt", sep="\t", row.names=2,
                                        col.names=c("Clade", "OTU.ID", "Taxonomy", "Eval", "Subtype"))))
TAX97bs <- tax_table(as.matrix(read.table("data/97_otus_bysample_tax_assignments.txt", sep="\t", row.names=2,
                                        col.names=c("Clade", "OTU.ID", "Taxonomy", "Eval", "Subtype"))))
TAX100 <- tax_table(as.matrix(read.table("data/100_otus_tax_assignments.txt", sep="\t", row.names=2,
                                         col.names=c("Clade", "OTU.ID", "Taxonomy", "Eval", "Subtype"))))
# Import sample data
SAM <- sample_data(read.delim("data/mapping_file.txt", sep="\t", header=T, row.names=1))
# Build phyloseq objects
phy97 <- phyloseq(OTU97, TAX97, SAM)
phy97bs <- phyloseq(OTU97bs, TAX97bs, SAM)
phy100 <- phyloseq(OTU100, TAX100, SAM)

Filter dataset

# Filter OTUs by minimum count
# Set threshold count
n <- 10
# Identify OTUs below threshold count
taxa97 <- taxa_sums(phy97)[which(taxa_sums(phy97) >= n)]
taxa97bs <- taxa_sums(phy97bs)[which(taxa_sums(phy97bs) >= n)]  
taxa100 <- taxa_sums(phy100)[which(taxa_sums(phy100) >= n)]
# Remove taxa below threshold count
phy97.f <- prune_taxa(names(taxa97), phy97)
phy97bs.f <- prune_taxa(names(taxa97bs), phy97bs)
phy100.f <- prune_taxa(names(taxa100), phy100)

# Filter samples by minimum count
# Set threshold number of reads
sn <- 200
# Remove samples with fewer reads than threshold
phy97.f <- prune_samples(sample_sums(phy97.f)>=sn, phy97.f)
phy97bs.f <- prune_samples(sample_sums(phy97bs.f)>=sn, phy97bs.f)
phy100.f <- prune_samples(sample_sums(phy100.f)>=sn, phy100.f)
  • Minimum count to retain OTU: 10
  • Minimum count to retain sample: 200

Descriptive statistics

Filtered dataset

# Compute summary statistics
stats97.f <- data.frame(`97% OTUs`=t(phystats(phy97.f)), check.names=F)
stats97bs.f <- data.frame(`97% within-sample OTUs`=t(phystats(phy97bs.f)), check.names=F)
stats100.f <- data.frame(`100% OTUs`=t(phystats(phy100.f)), check.names=F)

# Create and plot histograms
taxhist97 <- hist(log10(taxa_sums(phy97.f)), plot=F)
taxhist97bs <- hist(log10(taxa_sums(phy97bs.f)), plot=F)
taxhist100 <- hist(log10(taxa_sums(phy100.f)), plot=F)
samhist97 <- hist(log10(sample_sums(phy97.f)), plot=F)
samhist97bs <- hist(log10(sample_sums(phy97bs.f)), plot=F)
samhist100 <- hist(log10(sample_sums(phy100.f)), plot=F)

par(mfrow=c(2, 3), mar=c(3,3,1,1))
plot(taxhist97, col="black", main="97% OTU counts", xlim=c(0, 6), las=1, mgp=c(2,0.5,0),
     xlab="No. sequences (log10)", ylab="No. OTUs", cex.lab=0.75, cex.axis=0.75)
plot(taxhist97bs, col="black", main="97% OTU counts", xlim=c(0, 6), las=1, mgp=c(2,0.5,0),
     xlab="No. sequences (log10)", ylab="No. OTUs", cex.lab=0.75, cex.axis=0.75)
plot(taxhist100, col="black", main="100% OTU counts", xlim=c(0, 6), las=1, mgp=c(2,0.5,0),
     xlab="No. sequences (log10)", ylab="No. OTUs", cex.lab=0.75, cex.axis=0.75)
plot(samhist97, col="black", main="97% OTU reads per sample", xlim=c(0, 6), las=1, mgp=c(2,0.5,0),
     xlab="No. sequences (log10)", ylab="No. samples", cex.lab=0.75, cex.axis=0.75)
plot(samhist97bs, col="black", main="97% OTU reads per sample", xlim=c(0, 6), las=1, mgp=c(2,0.5,0),
     xlab="No. sequences (log10)", ylab="No. samples", cex.lab=0.75, cex.axis=0.75)
plot(samhist100, col="black", main="100% OTU reads per sample", xlim=c(0, 6), las=1, mgp=c(2,0.5,0),
     xlab="No. sequences (log10)", ylab="No. samples", cex.lab=0.75, cex.axis=0.75)

# Create stats table
knitr::kable(cbind(stats97.f, stats97bs.f, stats100.f))
97% OTUs 97% within-sample OTUs 100% OTUs
Total count in OTU table 1492454 1492337 944975
Number of OTUs 102 117 4727
Range of OTU counts 10 - 739699 0 - 474176 10 - 171442
Number of singleton OTUs 0 1 0
Number of samples 80 80 80
Range of reads per sample 706 - 169958 707 - 169964 485 - 97036
Arithmetic mean (±SD) reads per sample 18655 ± 21113 18654 ± 21114 11812 ± 12750
Geometric mean (±SD) reads per sample 13165 ± 2 13161 ± 2 8152 ± 2

Raw dataset

# Compute summary statistics
stats97 <- data.frame(`97% OTUs`=t(phystats(phy97)), check.names=F)
stats97bs <- data.frame(`97% within-sample OTUs`=t(phystats(phy97bs)), check.names=F)
stats100 <- data.frame(`100% OTUs`=t(phystats(phy100)), check.names=F)

# Create and plot histograms
taxhist97 <- hist(log10(taxa_sums(phy97)), plot=F)
samhist97 <- hist(log10(sample_sums(phy97)), plot=F)

taxhist97bs <- hist(log10(taxa_sums(phy97bs)), plot=F)
samhist97bs <- hist(log10(sample_sums(phy97bs)), plot=F)

taxhist100 <- hist(log10(taxa_sums(phy100)), plot=F)
samhist100 <- hist(log10(sample_sums(phy100)), plot=F)

par(mfrow=c(2, 3), mar=c(3,3,1,1))
plot(taxhist97, col="black", main="97% OTU counts", xlim=c(0, 6), las=1, mgp=c(2,0.5,0),
     xlab="No. sequences (log10)", ylab="No. OTUs", cex.lab=0.75, cex.axis=0.75)
plot(taxhist97bs, col="black", main="97% within-sample OTU counts", xlim=c(0, 6), las=1, mgp=c(2,0.5,0),
     xlab="No. sequences (log10)", ylab="No. OTUs", cex.lab=0.75, cex.axis=0.75)
plot(taxhist100, col="black", main="100% OTU counts", xlim=c(0, 6), las=1, mgp=c(2,0.5,0),
     xlab="No. sequences (log10)", ylab="No. OTUs", cex.lab=0.75, cex.axis=0.75)
plot(samhist97, col="black", main="97% OTU reads per sample", xlim=c(0, 6), las=1, mgp=c(2,0.5,0),
     xlab="No. sequences (log10)", ylab="No. samples", cex.lab=0.75, cex.axis=0.75)
plot(samhist97bs, col="black", main="97% within-sample OTU reads per sample", xlim=c(0, 6), las=1, mgp=c(2,0.5,0),
     xlab="No. sequences (log10)", ylab="No. samples", cex.lab=0.75, cex.axis=0.75)
plot(samhist100, col="black", main="100% OTU reads per sample", xlim=c(0, 6), las=1, mgp=c(2,0.5,0),
     xlab="No. sequences (log10)", ylab="No. samples", cex.lab=0.75, cex.axis=0.75)

# Create stats table
knitr::kable(cbind(stats97, stats97bs, stats100))
97% OTUs 97% within-sample OTUs 100% OTUs
Total count in OTU table 1492736 1492692 1056483
Number of OTUs 135 174 41449
Range of OTU counts 2 - 739709 2 - 474177 2 - 171487
Number of singleton OTUs 0 0 0
Number of samples 84 84 84
Range of reads per sample 3 - 169977 3 - 169975 1 - 112823
Arithmetic mean (±SD) reads per sample 17770 ± 20982 17770 ± 20982 12577 ± 14480
Geometric mean (±SD) reads per sample 9427 ± 5 9400 ± 5 6451 ± 6

Transform count data

Count data are transformed to both relative abundance (proportions) and square-root proportions for downstream statistical analyses.

# Convert to proportion (relative abundance)
phy97.f.p <- transform_sample_counts(phy97.f, function(x) x/sum(x))
phy97bs.f.p <- transform_sample_counts(phy97bs.f, function(x) x/sum(x))
phy100.f.p <- transform_sample_counts(phy100.f, function(x) x/sum(x))
# Apply transformation function
transform <- function(x) sqrt(x/sum(x))  # Set transformation function
phy97.f.t <- transform_sample_counts(phy97.f, transform)  # Transform data
phy97bs.f.t <- transform_sample_counts(phy97bs.f, transform) 
phy100.f.t <- transform_sample_counts(phy100.f, transform)

Symbiodinium in each coral

For each coral species, barplots are presented showing the relative abundance of OTUs obtained by 100%, 97%, and 97%-within-sample clustering. OTUs comprising more than 4% of a sample are labeled with the unique OTU number and the Symbiodinium subtype and NCBI GenBank accession number of the closest BLAST hit for that OTU in the reference database. OTU numbers and barplot colors are NOT comparable across the three clustering methods.

Diploria strigosa

Notice how 97% OTUs and 97% within-sample OTUs are NOT THE SAME! Specifically, notice how in the 97% OTU dataset, all samples except two are dominated by the same OTU, denovo235. In the 97% within-sample OTU dataset, those samples are dominated by three different OTUs – denovo73, denovo231, and denovo10. Look at the 100% OTU dataset to see the composition of unique sequence variants in these samples. Indeed, these samples are dominated by different sequence variants. However, in the 97% OTU approach, they have all been collapsed into the same OTU, while in the 97% within-sample clustering approach, samples with different dominant sequence variants are maintained as separate OTUs, even though these different sequence variants are more than 97% similar to each other. I suggest that the within-sample clustering approach is a better way of treating Symbiodinium ITS2 sequence data.

# Create subsetted phyloseq objects for Diploria strigosa
dstr97.f.p <- subset_samples(phy97.f.p, Species=="strigosa")
dstr97bs.f.p <- subset_samples(phy97bs.f.p, Species=="strigosa")
dstr100.f.p <- subset_samples(phy100.f.p, Species=="strigosa")
# Plot custom barplots for Diploria strigosa
par(mfrow=c(3,1), mar=c(4,4,2,0), mgp=c(0,0.5,0), lwd=0.1)
otubarplot(dstr97.f.p, main="97% OTUs")
otubarplot(dstr97bs.f.p, main="97% within-sample OTUs")
otubarplot(dstr100.f.p, main="100% OTUs")

dstr.net <- makenet(dstr97bs.f.p, 0)
set.seed(54538)
plotnet(dstr.net)

Network analysis for D. strigosa

Number of OTUs: 40

White circles represent coral samples, colored circles represent Symbiodinium OTUs (97% within-sample). Thickness of lines corresponds to the relative abundance of an OTU within a sample.


Porites furcata

# Create subsetted phyloseq objects for Porites furcata
pfur97.f.p <- subset_samples(phy97.f.p, Species=="furcata")
pfur97bs.f.p <- subset_samples(phy97bs.f.p, Species=="furcata")
pfur100.f.p <- subset_samples(phy100.f.p, Species=="furcata")
# Plot custom barplots for Porites furcata
par(mfrow=c(3,1), mar=c(4,4,2,0), mgp=c(0,0.5,0), lwd=0.1)
otubarplot(pfur97.f.p, main="97% OTUs")
otubarplot(pfur97bs.f.p, main="97% within-sample OTUs")
otubarplot(pfur100.f.p, main="100% OTUs")

# Network analysis
pfur.net <- makenet(pfur97bs.f.p, 0)
par(mar=c(0,0,0,0))
set.seed(54538)
plotnet(pfur.net)

Network analysis for P. furcata

Number of OTUs: 24

White circles represent coral samples, colored circles represent Symbiodinium OTUs (97% within-sample). Thickness of lines corresponds to the relative abundance of an OTU within a sample.

Orbicella annularis

# Create subsetted phyloseq objects for Orbicella annularis
oann97.f.p <- subset_samples(phy97.f.p, Species=="annularis")
oann97bs.f.p <- subset_samples(phy97bs.f.p, Species=="annularis")
oann100.f.p <- subset_samples(phy100.f.p, Species=="annularis")
# Plot custom barplots for Orbicella annularis
par(mfrow=c(3,1), mar=c(4,4,2,0), mgp=c(0,0.5,0), lwd=0.1)
otubarplot(oann97.f.p, main="97% OTUs")
otubarplot(oann97bs.f.p, main="97% within-sample OTUs")
otubarplot(oann100.f.p, main="100% OTUs")

# Network analysis
oann.net <- makenet(oann97bs.f.p, 0)
par(mar=c(0,0,0,0))
set.seed(54538)
plotnet(oann.net)

Network analysis for O. annularis

Number of OTUs: 6

White circles represent coral samples, colored circles represent Symbiodinium OTUs (97% within-sample). Thickness of lines corresponds to the relative abundance of an OTU within a sample.

Millepora alcicornis

# Create subsetted phyloseq objects for Millepora alcicornis
malc97.f.p <- subset_samples(phy97.f.p, Species=="alcicornis")
malc97bs.f.p <- subset_samples(phy97bs.f.p, Species=="alcicornis")
malc100.f.p <- subset_samples(phy100.f.p, Species=="alcicornis")
# Plot custom barplots for Millepora alcicornis
par(mfrow=c(3,1), mar=c(4,4,2,0), mgp=c(0,0.5,0), lwd=0.1)
otubarplot(malc97.f.p, main="97% OTUs")
otubarplot(malc97bs.f.p, main="97% within-sample OTUs")
otubarplot(malc100.f.p, main="100% OTUs")

# Network analysis
malc.net <- makenet(malc97bs.f.p, 0)
par(mar=c(0,0,0,0))
set.seed(54538)
plotnet(malc.net)

Network analysis for M. alcicornis

Number of OTUs: 18

White circles represent coral samples, colored circles represent Symbiodinium OTUs (97% within-sample). Thickness of lines corresponds to the relative abundance of an OTU within a sample.

Siderastrea siderea

# Create subsetted phyloseq objects for Siderastrea siderea
ssid97.f.p <- subset_samples(phy97.f.p, Species=="siderea")
ssid97bs.f.p <- subset_samples(phy97bs.f.p, Species=="siderea")
ssid100.f.p <- subset_samples(phy100.f.p, Species=="siderea")
# Plot custom barplots for Siderastrea siderea
par(mfrow=c(3,1), mar=c(4,4,2,0), mgp=c(0,0.5,0), lwd=0.1)
otubarplot(ssid97.f.p, main="97% OTUs")
otubarplot(ssid97bs.f.p, main="97% within-sample OTUs")
otubarplot(ssid100.f.p, main="100% OTUs")

# Network analysis
ssid.net <- makenet(ssid97bs.f.p, 0)
par(mar=c(0,0,0,0))
set.seed(834)
plotnet(ssid.net)

Network analysis for S. siderea

Number of OTUs: 16

White circles represent coral samples, colored circles represent Symbiodinium OTUs (97% within-sample). Thickness of lines corresponds to the relative abundance of an OTU within a sample.

Favia fragum

# Create subsetted phyloseq objects for Favia fragum
ffra97.f.p <- subset_samples(phy97.f.p, Species=="fragum")
ffra97bs.f.p <- subset_samples(phy97bs.f.p, Species=="fragum")
ffra100.f.p <- subset_samples(phy100.f.p, Species=="fragum")
# Plot custom barplots for Favia fragum
par(mfrow=c(3,1), mar=c(4,4,2,0), mgp=c(0,0.5,0), lwd=0.1)
otubarplot(ffra97.f.p, main="97% OTUs")
otubarplot(ffra97bs.f.p, main="97% within-sample OTUs")
otubarplot(ffra100.f.p, main="100% OTUs")

# Network analysis
ffra.net <- makenet(ffra97bs.f.p, 0)
par(mar=c(0,0,0,0))
set.seed(54538)
plotnet(ffra.net)

Network analysis for F. fragum

Number of OTUs: 14

White circles represent coral samples, colored circles represent Symbiodinium OTUs (97% within-sample). Thickness of lines corresponds to the relative abundance of an OTU within a sample.

Siderastrea radians

# Create subsetted phyloseq objects for Siderastrea radians
srad97.f.p <- subset_samples(phy97.f.p, Species=="radians")
srad97bs.f.p <- subset_samples(phy97bs.f.p, Species=="radians")
srad100.f.p <- subset_samples(phy100.f.p, Species=="radians")
# Plot custom barplots for Siderastrea radians
par(mfrow=c(3,1), mar=c(4,4,2,0), mgp=c(0,0.5,0), lwd=0.1)
otubarplot(srad97.f.p, main="97% OTUs")
otubarplot(srad97bs.f.p, main="97% within-sample OTUs")
otubarplot(srad100.f.p, main="100% OTUs")

# Network analysis
srad.net <- makenet(srad97bs.f.p, 0)
par(mar=c(0,0,0,0))
set.seed(54538)
plotnet(srad.net)

Network analysis for S. radians

Number of OTUs: 14

White circles represent coral samples, colored circles represent Symbiodinium OTUs (97% within-sample). Thickness of lines corresponds to the relative abundance of an OTU within a sample.

Porites astreoides

# Create subsetted phyloseq objects for Porites astreoides
past97.f.p <- subset_samples(phy97.f.p, Species=="astreoides")
past97bs.f.p <- subset_samples(phy97bs.f.p, Species=="astreoides")
past100.f.p <- subset_samples(phy100.f.p, Species=="astreoides")
# Plot custom barplots for Porites astreoides
par(mfrow=c(3,1), mar=c(4,4,2,0), mgp=c(0,0.5,0), lwd=0.1)
otubarplot(past97.f.p, main="97% OTUs")
otubarplot(past97bs.f.p, main="97% within-sample OTUs")
otubarplot(past100.f.p, main="100% OTUs")

# Network analysis
past.net <- makenet(past97bs.f.p, 0)
par(mar=c(0,0,0,0))
set.seed(54538)
plotnet(past.net)

Network analysis for P. astreoides

Number of OTUs: 8

White circles represent coral samples, colored circles represent Symbiodinium OTUs (97% within-sample). Thickness of lines corresponds to the relative abundance of an OTU within a sample.

Dendrogyra cylindrus

# Create subsetted phyloseq objects for Dendrogyra cylindrus
dcyl97.f.p <- subset_samples(phy97.f.p, Species=="cylindrus")
dcyl97bs.f.p <- subset_samples(phy97bs.f.p, Species=="cylindrus")
dcyl100.f.p <- subset_samples(phy100.f.p, Species=="cylindrus")
# Plot custom barplots for Dendrogyra cylindrus
par(mfrow=c(3,1), mar=c(4,4,2,0), mgp=c(0,0.5,0), lwd=0.1)
otubarplot(dcyl97.f.p, main="97% OTUs")
otubarplot(dcyl97bs.f.p, main="97% within-sample OTUs")
otubarplot(dcyl100.f.p, main="100% OTUs")

# Network analysis
dcyl.net <- makenet(dcyl97bs.f.p, 0)
par(mar=c(0,0,0,0))
set.seed(54538)
plotnet(dcyl.net)

Network analysis for D. cylindrus

Number of OTUs: 17

White circles represent coral samples, colored circles represent Symbiodinium OTUs (97% within-sample). Thickness of lines corresponds to the relative abundance of an OTU within a sample.

Montastraea cavernosa

# Create subsetted phyloseq objects for Montastraea cavernosa
mcav97.f.p <- subset_samples(phy97.f.p, Species=="cavernosa")
mcav97bs.f.p <- subset_samples(phy97bs.f.p, Species=="cavernosa")
mcav100.f.p <- subset_samples(phy100.f.p, Species=="cavernosa")
# Plot custom barplots for Montastraea cavernosa
par(mfrow=c(3,1), mar=c(4,4,2,0), mgp=c(0,0.5,0), lwd=0.1)
otubarplot(mcav97.f.p, main="97% OTUs")
otubarplot(mcav97bs.f.p, main="97% within-sample OTUs")
otubarplot(mcav100.f.p, main="100% OTUs")

# Network analysis
mcav.net <- makenet(mcav97bs.f.p, 0)
par(mar=c(0,0,0,0))
set.seed(54538)
plotnet(mcav.net)

Network analysis for M. cavernosa

Number of OTUs: 6

White circles represent coral samples, colored circles represent Symbiodinium OTUs (97% within-sample). Thickness of lines corresponds to the relative abundance of an OTU within a sample.

All species barplot

Here the composition of each sample is plotted as a horizontal bar, sorted by species and shore. Each segment of the bar represents an individual OTU and is colored by clade. This plot is a nice overview of the whole dataset but only provides coarse information at the clade level.

par(mfrow=c(1,1), mar=c(2, 1.5, 2, 5), lwd=0.1, cex=0.7)
# Plot composition of 97% within-sample OTUs colored by clade
composition(phy97bs.f.p, col=taxcolors[factor(data.frame(tax_table(phy97bs.f.p))[order(data.frame(tax_table(phy97bs.f.p))$Subtype), ]$Clade, levels=c("A","B","C","D","F","G"))], legend=T)

# Plot composition of 97% within-sample OTUs colored by OTU
#composition(phy97bs.f.p, col=rainbow(ntaxa(phy97bs.f.p)), legend=F)

All species network analysis

In this network, each coral species is collapsed into a single node (white squares), and connected to every Symbiodinium OTU (colored circles) that comprises at least 1% relative abundance within at least one individual of that species. Thus, very low abundance OTUs are not represented. The thickness of connections between coral species and OTUs is scaled by the relative proportion of samples of that species in which that OTU was present at ≥1% relative abundance (i.e., the frequency of presence at ≥1%). The size of Symbiodinium OTU nodes in the network is scaled by the number of species in which they occur, and colored according to clade.

# Get network with one node for each species.
# Edge weight is the mean relative abundance of OTU.
# Convert OTU table to "edges" table (weight = relative abundance)
otudf <- data.frame(t(otu_table(phy97bs.f.p)), check.names=F)
# Average relative abundance of OTU within species
#sp.ag <- aggregate(otudf, by=list(Species=data.frame(sample_data(phy97.f.p))$Species), FUN=mean)
# Percent of samples of species that contain OTU at more than 1% relative abundance
sp.ag <- aggregate(otudf, by=list(Species=data.frame(sample_data(phy97bs.f.p))$Species), 
                   FUN=function(x) length(x[x>0.01])/length(x))

rownames(sp.ag) <- sp.ag$Species
sp.ag <- sp.ag[, -1]
edges <- setNames(melt(as.matrix(sp.ag)), nm=c("source", "target", "weight"))  # Melt data
edges <- droplevels(edges[edges$weight>0, ])
# Create "nodes" table from sample and tax data
sdf <- data.frame(id=levels(data.frame(sample_data(phy97bs.f.p))$Species))
tdf <- data.frame(tax_table(phy97bs.f.p))[rownames(data.frame(tax_table(phy97bs.f.p))) %in% edges$target, ]
tdf$id <- rownames(tdf)
nodes <- merge(sdf, tdf, all=T)
nodes$type <- ifelse(is.na(nodes$Clade), 1, 2)
# Create network
net <- graph_from_data_frame(d=edges, vertices=nodes, directed=F)

# Best one for now ---------
V(net)$shape <- c("square", "circle")[factor(V(net)$type)]
V(net)$size <- ifelse(is.na(V(net)$Clade), 15, 10*sqrt(degree(net))) #5*sqrt(degree(net))
V(net)$label <- ifelse(is.na(V(net)$Clade), names(V(net)),  
                       unlist(lapply(strsplit(V(net)$Subtype, split="_"), "[", 1)))
V(net)$label <- str_replace_all(V(net)$label, c("alcicornis"="M.alc.", "annularis"="O.ann",
                                                "astreoides"="P.ast.", "cavernosa"="M.cav.",
                                                "cylindrus"="D.cyl.", "fragum"="F.fra.", 
                                                "furcata"="P.fur.", "radians"="S.rad.", 
                                                "siderea"="S.sid.", "strigosa"="D.str."))
V(net)$color <- ifelse(is.na(V(net)$Clade), "white",
                       taxcolors[factor(V(net)$Clade, levels=c("A","B","C","D","F","G"))])
E(net)$color <- "gray60"
E(net)$width <- 15 * E(net)$weight
set.seed(12347)
set.seed(12364)
set.seed(12374)
l <- layout.fruchterman.reingold(net)
l <- norm_coords(l, ymin=-1, ymax=1, xmin=-1, xmax=1)
par(mar=c(0,0,0,0))
plot(net, rescale=F, layout=l*1, edge.curved=0.1, vertex.label.cex=V(net)$size/20,
     vertex.label.color="black")

#legend(x=-1.05, y=1.1, c("CladeA", "CladeB", "CladeC", "CladeD", "CladeF", "CladeG"), pch=21,
#     col="#777777", pt.bg=taxcolors, pt.cex=1.1, cex=.6, bty="n", ncol=2, text.width=0.2)

Differences between species

Whether Symbiodinium communities differ among species is evaluated using permutational analysis of variance (PERMANOVA).

# PERMANOVA for difference among species
sppdiffs <- function(phy) {
  permanova.spp.t <- adonis(phyloseq::distance(phy, "bray") ~ Species, 
                          data=as(sample_data(phy), "data.frame"), permutations=9999)
  permanova.spp.t
  # Pairwise PERMANOVA tests for differences between species
  dmat <- as(phyloseq::distance(phy, "bray"), "matrix")
  df <- as(sample_data(phy), "data.frame")
  pairs <- data.frame(t(combn(levels(df$Species), 2)), stringsAsFactors=F)
  p.pairs <- setNames(rep(NA, nrow(pairs)), with(pairs, paste(X1, X2, sep='-')))
  for (i in 1:nrow(pairs)) {
    dfs <- subset(df, Species %in% c(pairs[i,1], pairs[i,2]))
    dmats <- dmat[rownames(dfs), rownames(dfs)]
    set.seed(152470)
    permanova <- adonis(dmats ~ Species, data=dfs, permutations=999)
    p.pairs[i] <- permanova$aov.tab$`Pr(>F)`[1]
  }
  # Return letters signifying differences 
  return(multcompLetters(p.pairs))
}

sppdiffs97 <- sppdiffs(phy97.f.t)
sppdiffs97bs <- sppdiffs(phy97bs.f.t)
sppdiffs100 <- sppdiffs(phy100.f.t)

sppdiffs <- data.frame("97% within-sample OTUs"=sppdiffs97bs$Letters,
                       "100% OTUs"=sppdiffs100$Letters, 
                       "97% OTUs"=sppdiffs97$Letters, check.names=F)
knitr::kable(sppdiffs, caption="Species not sharing a letter are significantly different (p < 0.05)")
Species not sharing a letter are significantly different (p < 0.05)
97% within-sample OTUs 100% OTUs 97% OTUs
alcicornis ab a a
annularis ab b bc
astreoides c c d
cavernosa d d e
cylindrus a e f
fragum ab f a
furcata e g b
radians f h g
siderea d d g
strigosa b b ac
Results:
  • At the 97% within-sample OTU scale, Symbiodinium communities are not significantly different in the corals M. alcicornis, O. annularis, and F. fragum, as these corals are all dominantly associated with the same Symbiodinium OTU (BLAST identity=B1). These corals are also not different from D. cylindrus or D. strigosa, which also dominantly associated with the same B1 OTU, although the latter two species are different from each other. Meanwhile, M. cavernosa and S. siderea also share similar symbiont communities dominated by Symbiodinium C3. Finally, S. radians, P. furcata, and P. astreoides each have distinct, significantly different Symbiodinium communities.
  • At the 100% OTU scale, Symbiodinium communities were significantly differentiated in all species except for O. annularis and D. strigosa (sharing B1-dominated communities), and S. siderea and M. cavernosa (sharing C3-dominated communities).

Differences between shores

Whether Symbiodinium communities differ between north vs. south shores within species is tested here using PERMANOVA.

97% within-sample OTUs

set.seed(43789)
set.seed(43789)
shorestats97bs <- perms(phy97bs.f.t, group="Species", trt="Shore")
knitr::kable(shorestats97bs, digits=3, row.names=F, 
             caption="97% within-sample OTU data: differences within and between shore for each species")
97% within-sample OTU data: differences within and between shore for each species
Species n overall within between R2 bd p
alcicornis 10 0.631 0.679 0.593 0.038 0.97 0.849
annularis 4 0.579 0.579 NaN NA NA NA
astreoides 5 0.057 0.057 NaN NA NA NA
cavernosa 7 0.015 0.016 0.014 0.163 0.39 0.670
cylindrus 8 0.132 0.138 0.127 0.136 0.54 0.620
fragum 8 0.271 0.327 0.223 0.086 0.48 0.786
furcata 9 0.712 0.559 0.835 0.360 0.81 0.047
radians 10 0.104 0.105 0.103 0.119 0.85 0.372
siderea 10 0.518 0.535 0.506 0.087 0.57 0.475
strigosa 9 0.814 0.757 0.859 0.228 0.23 0.092

100% OTUs

set.seed(43789)
shorestats100 <- perms(phy100.f.t, group="Species", trt="Shore")
knitr::kable(shorestats100, digits=3, row.names=F, 
             caption="100% OTU data: differences within and between shore for each species")
100% OTU data: differences within and between shore for each species
Species n overall within between R2 bd p
alcicornis 10 0.64 0.65 0.64 0.095 0.572 0.541
annularis 4 0.73 0.73 NaN NA NA NA
astreoides 5 0.41 0.41 NaN NA NA NA
cavernosa 7 0.46 0.45 0.46 0.188 0.949 0.293
cylindrus 8 0.43 0.38 0.47 0.212 0.029 0.063
fragum 8 0.55 0.57 0.53 0.129 0.296 0.508
furcata 9 0.70 0.55 0.81 0.446 0.386 0.013
radians 10 0.40 0.40 0.39 0.074 0.499 0.919
siderea 10 0.67 0.68 0.66 0.097 0.447 0.490
strigosa 9 0.81 0.76 0.85 0.222 0.043 0.094

97% OTUs

set.seed(43789)
shorestats97 <- perms(phy97.f.t, group="Species", trt="Shore")
knitr::kable(shorestats97, digits=3, row.names=F, 
             caption="97% OTU data: differences within and between shore for each species")
97% OTU data: differences within and between shore for each species
Species n overall within between R2 bd p
alcicornis 10 0.132 0.130 0.13 0.106 0.17 0.439
annularis 4 0.592 0.592 NaN NA NA NA
astreoides 5 0.075 0.075 NaN NA NA NA
cavernosa 7 0.127 0.125 0.13 0.259 0.23 0.084
cylindrus 8 0.126 0.105 0.14 0.271 0.20 0.037
fragum 8 0.221 0.262 0.18 0.083 0.40 0.845
furcata 9 0.538 0.374 0.67 0.532 0.20 0.052
radians 10 0.132 0.139 0.13 0.062 0.96 0.864
siderea 10 0.298 0.276 0.32 0.212 0.19 0.018
strigosa 9 0.417 0.407 0.42 0.213 0.26 0.323
Results:
  • P. furcata is the only species whose symbiont communities are significantly different between the north and south shores, at both the 97% within-sample OTU and 100% OTU scales.

Beta diversity

  • Beta diversity is evaluated here as the multivariate dispersion of samples within a coral species. Principal coordinate analysis on Bray-Curtis dissimilarities is used to calculate average distance-to-centroid values for each species group, which are then compared statistically by a permutation test. This analysis is implemented using betadisper() in the vegan package, based on Anderson (2006).
  • In this context, beta diveristy is analogous to ‘flexibility’ in symbiosis – how different can corals of the same species be in terms of their symbiont communities?
  • The species with highest beta diversity are D. strigosa, P. furcata, and M. alcicornis, followed by O. annularis and S. siderea. Species with the lowest beta diversity are F. fragum, D. cylindrus, S. radians, P. astreoides, and M. cavernosa.

97% within-sample OTUs

betad97bs <- betad(phy97bs.f.t, group="Species")
with(betad97bs$sambdsumm.ord, {
  plot(mean, type="n", main="97% within-sample OTUs",
       ylim=c(0, 0.75), ylab="Distance to centroid", xaxt="n", xlab="")
  arrows(1:10, mean - se, 1:10, mean + se, length=0.05, angle=90, code=3)
  points(1:10, mean, cex=1, pch=21, bg="white")
  text(1:10, par("usr")[3]-0.025, srt=45, adj=1, xpd=T, cex=0.75, 
       labels=levels(SAM$Species)[order(betad97bs$sambdsumm$mean, decreasing=T)])
  text(1:10, mean + se + 0.03, labels=betad97bs$saml$Letters[as.character(group)], cex=0.5)
})

100% OTUs

betad100 <- betad(phy100.f.t, group="Species")
with(betad100$sambdsumm.ord, {
  plot(mean, type="n", main="100% OTUs",
       ylim=c(0, 0.75), ylab="Distance to centroid", xaxt="n", xlab="")
  arrows(1:10, mean - se, 1:10, mean + se, length=0.05, angle=90, code=3)
  points(1:10, mean, cex=1, pch=21, bg="white")
  text(1:10, par("usr")[3]-0.025, srt=45, adj=1, xpd=T, cex=0.75, 
       labels=levels(SAM$Species)[order(betad100$sambdsumm$mean, decreasing=T)])
  text(1:10, mean + se + 0.03, labels=betad100$saml$Letters[as.character(group)], cex=0.5)
})

97% OTUs

betad97 <- betad(phy97.f.t, group="Species")
# Figure: Distance to centroids
with(betad97$sambdsumm.ord, {
  plot(mean, type="n", main="97% OTUs",
       ylim=c(0, 0.75), ylab="Distance to centroid", xaxt="n", xlab="")
  arrows(1:10, mean - se, 1:10, mean + se, length=0.05, angle=90, code=3)
  points(1:10, mean, cex=1, pch=21, bg="white")
  text(1:10, par("usr")[3]-0.025, srt=45, adj=1, xpd=T, cex=0.75, 
       labels=levels(SAM$Species)[order(betad97$sambdsumm$mean, decreasing=T)])
  text(1:10, mean + se + 0.03, labels=betad97$saml$Letters[as.character(group)], cex=0.5)
})

Main findings

  1. Symbiont communities are generally consistent with previous work on these species in the Caribbean, but high-throughput sequencing provides a much more complete picture of these communities’ composition.
  2. South shore P. furcata are more likely to have clade A instead of C, and S. siderea more likely to have clade D instead of C.
  3. Within-sample OTU clustering is more appropriate than clustering across samples.
  4. We can apply new types of analysis with this much data, including network analysis and statistical tests on beta diversity.